Tag

#speech recognition

8 articles

Qwen3.5-Omni learned to write code from spoken instructions and video without anyone training it to

Learn how to build a system that processes audio and video inputs to generate code, simulating the capabilities of multimodal AI models like Qwen3.5-Omni.

Mar 31

Cohere releases open source model that tops speech recognition benchmarks

Cohere has released an open-source speech recognition model that outperforms industry leader OpenAI's Whisper in benchmark tests.

Mar 27

Cohere AI Releases Cohere Transcribe: A SOTA Automatic Speech Recognition (ASR) Model Powering Enterprise Speech Intelligence

Cohere AI has released Cohere Transcribe, a state-of-the-art automatic speech recognition model designed to transform audio into actionable text for enterprise use cases.

Mar 26

Google AI Releases WAXAL: A Multilingual African Speech Dataset for Training Automatic Speech Recognition and Text-to-Speech Models

Learn how Google's new WAXAL dataset helps improve speech technology for African languages by providing training data for AI systems.

Mar 16

IBM AI Releases Granite 4.0 1B Speech as a Compact Multilingual Speech Model for Edge AI and Translation Pipelines

Learn how IBM's new Granite 4.0 1B Speech AI model helps computers understand and translate speech in multiple languages, even on small devices without internet access.

Mar 15

7 surprisingly useful ways to use ChatGPT's voice mode, from a former skeptic

This explainer explores ChatGPT's Voice Mode technology, examining its multimodal architecture, real-time processing challenges, and implications for AI accessibility and reliability.

Mar 10

tech

This AI Agent Is Ready to Serve, Mid-Phone Call

Learn to build a basic AI voice assistant that can handle phone call interactions using Python, speech recognition, and text-to-speech technologies.

Mar 2

tech

ElevenLabs and Google dominate Artificial Analysis' updated speech-to-text benchmark

Learn to implement and compare speech-to-text capabilities using Google Cloud and ElevenLabs APIs, including audio processing, transcription functions, and service evaluation.

Mar 1